1. INTRODUCTION


The calculation of distribution functions is one of the important tasks in mathematical statistics. Of particular importance are the sampling distributions of statistics used in testing hypotheses. These require the determination of the distribution of a function of, say, n random variables, each of which has the same basic distribution. The sampling distribution can be expressed in the form of a definite integral and, if this integral can be evaluated analytically, it is only necessary to prepare a table of the resulting function; if not, the table may be obtained by numerical integration.

An empirical method of determining the sampling distribution suggests itself immediately from the formulation of the problem. This method consists of taking, say, N samples, each consisting of n observations on the random variable, and calculating the value of the statistic for each sample. The "empirical" distribution can then be obtained by arranging the values of the statistic in order of size and counting the proportion that are less than or equal to a given value. (In practice, it is usually desirable to apply a smoothing process to the resulting distribution.)
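The empirical method just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the statistic (a sample mean), the basic distribution (standard normal), the values n = 5 and N = 2000, the seed, and the function names are all hypothetical choices, not taken from the paper.

```python
import bisect
import random

def empirical_distribution(statistic, draw, n, N, rng):
    """Return the N statistic values, in order of size, from N samples of n observations."""
    values = []
    for _ in range(N):
        sample = [draw(rng) for _ in range(n)]
        values.append(statistic(sample))
    values.sort()
    return values

def proportion_at_most(sorted_values, s):
    """Proportion of the ordered values that are less than or equal to s."""
    return bisect.bisect_right(sorted_values, s) / len(sorted_values)

rng = random.Random(1952)  # arbitrary seed, for repeatability only
values = empirical_distribution(
    statistic=lambda xs: sum(xs) / len(xs),   # assumed statistic: the sample mean
    draw=lambda r: r.gauss(0.0, 1.0),         # assumed basic distribution: N(0, 1)
    n=5, N=2000, rng=rng)
print(proportion_at_most(values, 0.0))
```

No smoothing is applied here; the ordered list is queried directly.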

The empirical method has been used since it was introduced by Student [6], without the indication that it could be considered as a way of getting approximate values for a definite integral.

In recent years, much attention has been directed towards the numerical solution of difficult mathematical problems. It has been found that probability interpretations may be used, not only to evaluate definite integrals, but also to invert matrices and to solve differential equations. The general class of methods has been given the code name of Monte Carlo and now has a fairly extensive literature (see, for example, Curtiss [2], Kahn [3], and The Monte Carlo Method [8]).

An outstanding contribution of Monte Carlo theory is the use of so-called "importance sampling". The concept will be developed in Section 3; the word "importance", however, will be avoided since it is not particularly appropriate. As far as we know, no one has yet utilized this concept in calculating sampling distributions. In this paper, some of the theoretical aspects of such a procedure will be examined, with particular attention to the practical aspects of using the method on high-speed electronic calculators.

It might be mentioned here that empirical sampling, because of probability considerations, cannot, with a practical number of samples, give results which are accurate to many significant figures. Even with the fastest electronic calculators now being built, statisticians, by virtue of economic necessity, may have to be satisfied with thousands of samples instead of millions. Nevertheless, there are many practical problems which could be solved by results obtainable most economically by empirical sampling. For example, one such problem is to determine which of a number of non-parametric tests is most powerful against a specified class of alternatives. A problem of this type is being prepared for computation by the SWAC (National Bureau of Standards Western Automatic Computer).

2. STATEMENT OF PROBLEM AND NOTATION

The basic model consists of:

(i) A random variable x which has a density function f(x).

(ii) A sample size n.

(iii) A statistic s which is a function of n independent observations on x.

In any practical problem, various values of n and various functional forms of s may arise, and this may affect the method of solution, but for the present only one value of n and one form of s will be used.

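The problem is to calculate (and tabulate) the sampling distribution of s, i.e. the probability that the statistic will be less than or equal to some given value. Let H(s) denote this function of s; then

$$ H(s) = \mathrm{Prob}\{ s(X_1, \ldots, X_n) \le s \} = \int \cdots \int f(x_1) f(x_2) \cdots f(x_n) \, dx_1 \cdots dx_n \qquad (2.1) $$

where the integration is taken over the region in the n dimensional sample space in which

$$ s(x_1, x_2, \ldots, x_n) \le s . $$

In accordance with standard usage, capital letters will indicate observed values, a dash beneath a letter denotes a vector, and the n dimensional space of possible sample values will be denoted by R_n. In addition, $f(\underline{x})$ will be written for $f(x_1) f(x_2) \cdots f(x_n)$ and $d\underline{x}$ for $dx_1 \cdots dx_n$; a single integral sign will be used for n-fold integration. The range of summation will be indicated by the subscript; summation over individuals in a sample will be for j from 1 to n and summation over samples will be for i from 1 to N. In addition it will be convenient to define:

$$ A(s) = \{ \underline{x} : s(\underline{x}) \le s \} = \text{set in } R_n \text{ in which } s(\underline{x}) \le s \qquad (2.2) $$

$$ t(\underline{x}) = \begin{cases} 1 & \text{if } s(\underline{x}) \le s, \text{ i.e. if } \underline{x} \in A(s) \\ 0 & \text{otherwise, i.e. if } \underline{x} \in R_n - A(s) \end{cases} \qquad (2.3) $$

$$ g(\underline{x}) = \text{any n dimensional density function, i.e. } g(\underline{x}) \ge 0 \text{ for all } \underline{x} \text{ in } R_n, \quad \int_{R_n} g(\underline{x}) \, d\underline{x} = 1 \qquad (2.4) $$

$$ v(\underline{x}) = \frac{t(\underline{x}) f(\underline{x})}{g(\underline{x})} \qquad (2.5) $$

$$ w(\underline{x}) = \frac{f(\underline{x})}{g(\underline{x})} \qquad (2.6) $$

With this notation (2.1) becomes

$$ H(s) = \int_{A(s)} f(\underline{x}) \, d\underline{x} = \int_{R_n} t(\underline{x}) f(\underline{x}) \, d\underline{x} \qquad (2.7) $$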


3. SAMPLING MODELS

Equation (2.7) can be regarded either as giving the value of H(s) as a definite integral or as stating that if a random variable x has a density function f(x) then

$$ E\{ t(\underline{x}) \} = H(s) $$

since the expectation operator is defined by just such an equation (Cramér [1], p. 170). This means that the average value of t(X) in N observations on x will tend to H(s) as N increases, i.e. the variance or the dispersion of these average values around H(s) is a decreasing function of N. Therefore one way of estimating H(s) is to take many samples (according to the basic model) and calculate the average value of t(X).

For this case the behaviour of the estimator is well known (and is furthermore independent of f(x)) since t(X_i) is a binomial variate:

$$ t(\underline{X}_i) = \begin{cases} 1 & \text{with probability } H(s) \\ 0 & \text{with probability } 1 - H(s) \end{cases} $$

and therefore it follows that the estimator

$$ p_1(s) = N^{-1} \sum_i t(\underline{X}_i) \qquad (3.1) $$

has

$$ E\{ p_1(s) \} = H(s) \qquad (3.2) $$

$$ N \, \mathrm{Var} \, p_1(s) = H(s) - [H(s)]^2 \qquad (3.3) $$
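For readers who want the intermediate step, the moments of the Model I estimator follow directly from the two-point distribution of t. The numerical values H(s) = 0.95 and N = 1000 used at the end are assumed purely for illustration.

$$ E\{t\} = 1 \cdot H(s) + 0 \cdot [1 - H(s)] = H(s), \qquad E\{t^2\} = H(s), \qquad \mathrm{Var} \, t = H(s) - [H(s)]^2 $$

$$ \mathrm{Var} \, p_1(s) = N^{-2} \sum_i \mathrm{Var} \, t(\underline{X}_i) = N^{-1} \{ H(s) - [H(s)]^2 \} $$

With H(s) = 0.95 and N = 1000 this gives Var p_1(s) = 0.0475/1000 = 0.0000475, a standard deviation of about 0.0069.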

This sampling model has consisted of two steps:

1. The space R_n is divided into two mutually exclusive sets of points, A(s) and R_n - A(s).

2. Points are selected from R_n; the probability of a point coming from A(s) is H(s) and the probability of it coming from R_n - A(s) is 1 - H(s). An estimate of H(s) therefore is the proportion of points coming from A(s).

The important point here is that the first part is a matter of geometry only and is independent of probability considerations.

From a consideration of step 2, it is clear that the probability of a point coming from A(s) does not have to be H(s) and it might be possible to reduce the variance of the estimate of H(s) by changing it. The mechanics of the procedure are apparent if the integrand in (2.7) is multiplied and divided by a density function g(x). The equation becomes

$$ H(s) = \int_{R_n} \frac{t(\underline{x}) f(\underline{x})}{g(\underline{x})} \, g(\underline{x}) \, d\underline{x} = \int_{R_n} v(\underline{x}) \, g(\underline{x}) \, d\underline{x} \qquad (3.4) $$

(where v(x) is defined by (2.5)) and is interpreted as stating that

$$ E\{ v(\underline{x}) \} = H(s) $$

where x has a density function g(x). Now, in step 2, the probability that a point comes from A(s) is

$$ \int_{A(s)} g(\underline{x}) \, d\underline{x} $$

and an estimate of H(s) is the average of v(X_i). The basis of "importance" sampling is to choose a g(x) such that regions which have a large contribution to the variance of the estimate are sampled more heavily than other regions (since the variance is a decreasing function of N) and therefore the overall variance of the estimate of H(s) will be reduced.

The word "importance" will not be used in this paper; instead the two sampling models will be referred to as Model I and Model II. Both of these models will give an empirical answer to the problem outlined in section 2 and differ in the distribution of the random variable which is sampled. These distributions are as follows:

Model I: The random variable sampled has density f(x).

Model II: The random variable sampled has density g(x) ≠ f(x).

   IIa: $g(\underline{x}) = g(x_1) \, g(x_2) \cdots g(x_n)$

   IIb: g(x) is not factorable as in IIa.

Any statement made about Model II without referring to a or b applies to both.
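A minimal sketch of Model IIa sampling may help fix ideas. Everything specific in it is an assumption made for illustration: the statistic is the sum of squares of n = 2 observations, f is the standard normal density, g is normal with mean zero and standard deviation 0.9, and the function name, seed and sample count are hypothetical. For n = 2 the exact value H(s) = 1 - exp(-s/2) is available as a check.

```python
import math
import random

def model_two_estimate(s, n, sigma, N, rng):
    """Average of v(X) = t(X) f(X) / g(X) over N samples drawn from g."""
    total = 0.0
    for _ in range(N):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]   # observations from g
        stat = sum(x * x for x in xs)                    # s(X): sum of squares
        if stat <= s:                                    # t(X) = 1
            # f/g for N(0,1) against N(0, sigma^2), multiplied over n coordinates
            log_w = n * math.log(sigma) - 0.5 * stat * (1.0 - 1.0 / sigma ** 2)
            total += math.exp(log_w)
    return total / N

rng = random.Random(7)                 # arbitrary seed
s, n = 3.0, 2
estimate = model_two_estimate(s, n, sigma=0.9, N=20000, rng=rng)
exact = 1.0 - math.exp(-s / 2.0)       # chi-square, 2 degrees of freedom
print(estimate, exact)
```

Setting sigma = 1 makes every weight equal to one, and the same routine then reproduces Model I sampling.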
It may be noted that Model II sampling bears a certain analogy to the stratified sampling used in surveys. A survey, in its simplest form, consists of selecting a sample of m individuals from a population of M individuals and observing the value of a characteristic on each of the selected individuals. The average value obtained in this way is an estimate of the average of that characteristic for the population. If the population can be divided into k strata, in each of which the characteristic has a smaller variance than in the whole population, an estimate of the mean with smaller variance can sometimes be obtained by allocating different numbers of samples to the different strata. Formulas can be determined for the optimum sample size for each stratum on the basis of the strata variances and the number of individuals in each stratum.

In Model II sampling the population consists of the points x of the n dimensional sample space and the characteristic is s(x). A stratum consists of points which have their values of s in a certain interval. The allocation of samples to the different strata is accomplished by choosing a g(x).

4. ESTIMATES OF H(s)

The problem, under Model II sampling, might be looked at as a selection of a g(x) such that the variance of some estimator of H(s) is a minimum. The minimum turns out to be zero, as has been shown by Kahn [3]. This trivial solution arises because the problem is, incorrectly, being considered as one in statistical inference. In statistical inference a sample is taken from a distribution which is a function of an unknown parameter and it is desired to estimate this unknown value from information obtained from the sample. In the present problem the distribution from which a sample is taken is completely specified and from this sample an "estimate" of H(s) is desired. However, H(s) is not unknown because its value is given by the integral (2.1). Quite naturally, therefore, the trivial solution states that an estimate of H(s) with zero variance can be obtained by evaluating the integral. The word "estimate" will be used in this paper in the special sense that the unknown quantity is not a parameter of the distribution which is being sampled.

For sampling under Model I, the estimator (3.1) is the only reasonable one to use. Its variance (3.3) and, in fact, its sampling distribution are known and do not depend on the functional form of f(x). Under Model II the problem is to estimate the expected value of v(x). This is a much more complex problem because the form of the distribution of v(x) may depend, not only on g(x), but also on f(x), and furthermore, this distribution may be such that the variance may not be a suitable criterion of desirability. Hence, the average value may not be a desirable estimator of E{v(x)}. Therefore the choice of estimator will depend on the particular situation. For the general discussion, the variance will be used and two classes of estimators having general applicability will be considered: first, those depending on the sample values and second, those depending on the ordered observations only.

Of the first class, the most natural one is the sample average, which has already been mentioned in the last section. Let

$$ p_2(s) = N^{-1} \sum_i v(\underline{X}_i) \qquad (4.1) $$

This estimator has the unfortunate property that p_2(s*), where s* is the largest observed value of s, is in general not equal to unity, i.e. p_2(s) is not a distribution function in the usual sense. The estimator may be normalized to make it run from zero to one by dividing by p_2(s*). This leads us to consider the "ratio" estimator:

$$ p_3(s) = \frac{p_2(s)}{p_2(s^*)} = \frac{\sum_i v(\underline{X}_i)}{\sum_i w(\underline{X}_i)} = \frac{\bar{v}}{\bar{w}} \qquad (4.2) $$

where w(x) is defined by (2.6).
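The difference between the two estimators is easy to see on a toy record of statistic values and weights w(X_i) = f(X_i)/g(X_i). The three data points below are made up for illustration, and the function names are hypothetical.

```python
def p2(stats, weights, s):
    """Estimator (4.1): average of v = t * w, where t = 1 when the statistic is <= s."""
    return sum(w for st, w in zip(stats, weights) if st <= s) / len(stats)

def p3(stats, weights, s):
    """Ratio estimator (4.2): sum of v divided by sum of w."""
    return sum(w for st, w in zip(stats, weights) if st <= s) / sum(weights)

stats = [1.0, 2.0, 3.0]        # made-up statistic values s(X_i)
weights = [0.5, 1.5, 2.0]      # made-up weights w(X_i)
s_star = max(stats)
print(p2(stats, weights, s_star), p3(stats, weights, s_star))
```

At the largest observed value the ratio estimator equals one by construction, while the simple average need not.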

Under the class of estimators depending on the ordered values of v(X_i), estimators such as the median, etc., might be considered. The problem of forcing the empirical sampling distribution to cover the interval from zero to one would arise here too. Investigation of this class is of secondary importance, because the time and memory space required to rank the s(X_i) increases too rapidly as N increases to make the use of these order statistics attractive for high speed computing machines.


5. COMPARISON OF VARIANCE OF ESTIMATORS

p_1(s) obtained under Model I has expectation and variance as given by (3.2) and (3.3) respectively. For any particular value of s, confidence limits for H(s) may be calculated by exact binomial theory if N is small or the asymptotic normality of p_1(s) if N is large. Confidence limits for the entire function H(s) of the form

$$ p_1(s) - \lambda N^{-1/2} < H(s) < p_1(s) + \lambda N^{-1/2} $$

may be obtained by the method given by Kolmogorov [4] if N is large or by tables given by Massey [5] if N is small. Wald and Wolfowitz [7] have given a method for calculating confidence limits of other forms.
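As an illustration of the large-N case, the constant λ can be obtained from the limiting Kolmogorov distribution K(λ) = 1 - 2 Σ (-1)^(k-1) exp(-2 k² λ²). The sketch below solves K(λ) = 0.95 by bisection; the confidence level, N = 1000, the truncation at 100 terms and the function names are assumptions chosen for the example.

```python
import math

def kolmogorov_cdf(lam, terms=100):
    """Limiting distribution K(lam) = 1 - 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)."""
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
    return 1.0 - 2.0 * total

def kolmogorov_lambda(confidence, lo=0.2, hi=3.0):
    """Solve K(lam) = confidence by bisection; K is increasing in lam."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if kolmogorov_cdf(mid) < confidence:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam = kolmogorov_lambda(0.95)
N = 1000
half_width = lam / math.sqrt(N)   # band is p_1(s) +/- half_width for all s
print(lam, half_width)
```

The band has the same width at every s, which makes it conservative in the tails compared with pointwise binomial limits.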
p_2(s) is defined by (4.1). Since

$$ E\{ v(\underline{x}) \} = H(s) $$

and

$$ \mathrm{var} \, v(\underline{x}) = \int_{A(s)} [f(\underline{x})]^2 [g(\underline{x})]^{-1} \, d\underline{x} - [H(s)]^2 \qquad (5.1) $$

it follows that

$$ E\{ p_2(s) \} = H(s) \qquad (5.2) $$

$$ N \, \mathrm{var} \, p_2(s) = \int_{A(s)} [f(\underline{x})]^2 [g(\underline{x})]^{-1} \, d\underline{x} - [H(s)]^2 \qquad (5.3) $$

i.e. var p_2(s) = 0 if

$$ g(\underline{x}) = \begin{cases} f(\underline{x}) \, [H(s)]^{-1} & \text{if } \underline{x} \in A(s) \\ 0 & \text{otherwise} \end{cases} \qquad (5.4) $$

This is the zero variance solution for g(x) mentioned in section 4. It is not of much use in actual problems because it requires a knowledge of H(s), applies to only one value of s and furthermore is defined in terms of the set A(s). If Model IIa were used, where $g(\underline{x}) = \prod_j g(x_j)$, g(x) could not be defined, in general, in terms of A(s), unless the statistic were of a similar form, i.e. $s(\underline{x}) = \prod_j s(x_j)$.
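The zero variance property can be checked by substituting (5.4) into (5.3); the short verification below uses only the definitions of section 2.

$$ \int_{A(s)} [f(\underline{x})]^2 [g(\underline{x})]^{-1} \, d\underline{x} = \int_{A(s)} f(\underline{x}) \, H(s) \, d\underline{x} = H(s) \int_{A(s)} f(\underline{x}) \, d\underline{x} = [H(s)]^2 $$

so that N var p_2(s) = [H(s)]^2 - [H(s)]^2 = 0. Equivalently, every point drawn from this g(x) lies in A(s) and has v(X_i) = f(X_i)/g(X_i) = H(s), so each sample returns the answer exactly; this is why the choice presupposes knowledge of H(s).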

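Next consider the more modest requirement of choosing g(x) so that var p_1(s) > var p_2(s), i.e.

$$ \int_{A(s)} f(\underline{x}) \, d\underline{x} > \int_{A(s)} [f(\underline{x})]^2 [g(\underline{x})]^{-1} \, d\underline{x} \qquad (5.5) $$

A sufficient condition for this inequality is

$$ g(\underline{x}) > f(\underline{x}) \quad \text{for all } \underline{x} \in A(s) . \qquad (5.6) $$

This however is not a necessary condition.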
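The sufficiency of (5.6) takes one line to verify. If g(x) > f(x) throughout A(s), then f(x)/g(x) < 1 there, and

$$ \int_{A(s)} [f(\underline{x})]^2 [g(\underline{x})]^{-1} \, d\underline{x} = \int_{A(s)} f(\underline{x}) \, \frac{f(\underline{x})}{g(\underline{x})} \, d\underline{x} < \int_{A(s)} f(\underline{x}) \, d\underline{x} = H(s) . $$

The condition is not necessary because (5.5) constrains only the integral: g(x) may fall below f(x) on part of A(s) provided the ratio f(x)/g(x), averaged over A(s) with weight f(x), stays below one.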
Since p_3(s) is a ratio of two random variables, the sample averages of v(x) and w(x), its distribution, or even its expected value and variance, cannot, in general, be calculated explicitly. The best we can do is to follow the procedure usually used in a situation of this kind and consider the expansion of a function of two variables, $\bar{v}$ and $\bar{w}$, in a Taylor series about the point E(v), E(w). Taking expected values of both sides of the equation gives the expected value of the ratio. A similar series can be obtained for the variance, etc. If an upper bound can be found for the sum of the series beginning with the (m+1)th term, the first m terms can be used as an approximation with a known maximum error. This is difficult to do in the general case but in any particular situation it may be possible.

In the general case, if the terms are decreasing functions of N, as N becomes large, all terms except the first in the series will become negligible. These statements are made precise in the theorem given by Cramér [1], p. 355 and 366. Here we need only the expected value and variance, of which the first terms are

$$ E \, p_3(s) \approx H(s) \qquad (5.7) $$

$$ \mathrm{var} \, p_3(s) \approx N^{-1} \left\{ \mathrm{var} \, v - 2 H(s) \, \mathrm{cov}(v, w) + [H(s)]^2 \, \mathrm{var} \, w \right\} \qquad (5.8) $$

It will be noticed that N^{-1} var v is var p_2(s); therefore p_3(s) can have a lower variance than p_2(s) only if cov(v(x), w(x)) is positive. Since

$$ \mathrm{cov}(v(\underline{x}), w(\underline{x})) = \int_{A(s)} [f(\underline{x})]^2 [g(\underline{x})]^{-1} \, d\underline{x} - H(s) \qquad (5.9) $$

var p_3(s) < var p_2(s) only if

$$ \int_{A(s)} [f(\underline{x})]^2 [g(\underline{x})]^{-1} \, d\underline{x} > H(s) . \qquad (5.10) $$

This is the opposite inequality to the one given in (5.5) for var p_2(s) < var p_1(s). The two methods of estimation are to some extent complementary.

6. COMPUTATION

Before considering some of the applications of Model II sampling it may be worthwhile to outline the changes in computational procedure that would be necessary. The computations carried out by a high speed machine in computing an empirical sampling distribution under Model I may be divided into three steps:

1. The generation and testing of random numbers and the transformation of these random numbers to observations from f(x).

2. The calculation of the statistic from n of these observations.

3. The preparation of an empirical distribution of N values of the statistic. This could take the form of a printed list of the N values (in order of size, if possible) or a printed frequency distribution, each frequency being a count of the number of values that fell in a certain interval (the intervals having been determined in advance).

Under Model II, step 1 would contain the transformation of random numbers to observations from g(x). Step 2 would be unchanged. However, in step 3, it is now necessary to record not only the value of the statistic but also the value of v(X) for each sample. The empirical distribution could take the form of a printed list of N values (in order of size, if possible) together with the value of v(X) for that sample, or a printed distribution, each frequency this time being the sum of the v(X)'s for all samples having the value of the statistic fall in that interval.
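The modified step 3 can be sketched as a small accumulator. The interval boundaries, the four records and the function names below are made up for illustration; the weight recorded with each sample is taken to be f(X)/g(X), which is the value of v(X) for any s at or above that sample's statistic. Sums of squares are kept as well, since they are needed later for variance estimates.

```python
import bisect

def accumulate(records, edges):
    """records: iterable of (statistic, weight) pairs; edges: increasing interval boundaries.
    Returns per-interval counts, sums of weights and sums of squared weights."""
    k = len(edges) + 1                       # last interval is everything above edges[-1]
    count = [0] * k
    sum_v = [0.0] * k
    sum_v2 = [0.0] * k
    for stat, v in records:
        i = bisect.bisect_left(edges, stat)  # interval i holds edges[i-1] < stat <= edges[i]
        count[i] += 1
        sum_v[i] += v
        sum_v2[i] += v * v
    return count, sum_v, sum_v2

def p2_at_edges(sum_v, N):
    """Cumulative weighted frequencies divided by N: the estimate p_2 at each boundary."""
    out, running = [], 0.0
    for total in sum_v[:-1]:
        running += total
        out.append(running / N)
    return out

records = [(0.5, 1.0), (1.5, 0.5), (1.0, 2.0), (9.0, 0.25)]   # made-up data
edges = [1.0, 2.0]
count, sum_v, sum_v2 = accumulate(records, edges)
print(count, sum_v, p2_at_edges(sum_v, len(records)))
```

Only three numbers per interval are stored, so memory use does not grow with N.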


The major change arises in the computation of the variance of the estimates. Under Model I, var p_1(s) may be estimated from p_1(s) (1 - p_1(s)), for which nothing is required except p_1 itself. Var p_2(s), however, must be estimated from the variance of the v(X)'s and, if this cannot be done by evaluating the integral (5.1) theoretically, it will be necessary to store the sum of [v(X)]² for all samples having values falling in each interval in the frequency distribution. p_3(s) presents an even more difficult problem since, in general, no exact formula for its variance exists. One estimate of var p_3(s) can be obtained by substituting estimates from the sample in (5.8). Another estimate can be obtained by dividing the N samples into k groups of N/k samples each, estimating p_3 from each group and calculating the variance of the k estimates.
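The grouping device for p_3 might look as follows. The data and the function names are hypothetical; the returned variance refers to estimates based on N/k samples each, and dividing it by k to approximate the variance for the full N samples is an additional assumption (variance inversely proportional to sample size, as in the first-order formula (5.8)).

```python
import statistics

def ratio_estimate(pairs):
    """p_3 for one group: sum of v divided by sum of w; pairs are (v, w)."""
    return sum(v for v, _ in pairs) / sum(w for _, w in pairs)

def grouped_variance(pairs, k):
    """Split the samples into k equal groups, estimate p_3 in each group,
    and return the sample variance of the k group estimates with the estimates."""
    m = len(pairs) // k
    estimates = [ratio_estimate(pairs[j * m:(j + 1) * m]) for j in range(k)]
    return statistics.variance(estimates), estimates

pairs = [(1.0, 2.0), (0.0, 1.0), (0.5, 1.0), (0.5, 1.0)]   # made-up (v, w) values
group_var, estimates = grouped_variance(pairs, k=2)
print(estimates, group_var)
```

A larger k gives a more stable variance estimate but makes each group estimate of the ratio more biased, so k is a compromise.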
If the computations are carried out on IBM machines, the calculation of v(X) may be time-consuming. Prof. J. W. Tukey has suggested that the work could be simplified by dividing the x axis into a finite number of intervals and in each one choosing g(x) so that [f(x)] [g(x)]^{-1} = a^i, where a is some constant and i = 0, ±1, ±2, etc. Then v(X) can be calculated for each sample merely by summing the i's that appear in that sample and raising a to that power. This procedure greatly simplifies the calculations but leads to estimates with large variance unless a is close to unity.
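The weight calculation in this scheme reduces to a table lookup and an integer sum. In the sketch below the breakpoints, the exponents attached to the intervals and the constant a = 1.1 are invented for illustration, and the function name is hypothetical; the sketch does not construct g(x) or check that it integrates to one.

```python
import bisect

def tukey_weight(sample, breakpoints, exponents, a):
    """Ratio f(X)/g(X) = a ** (sum of the interval exponents i, one per observation)."""
    total = 0
    for x in sample:
        total += exponents[bisect.bisect_right(breakpoints, x)]
    return a ** total

breakpoints = [-1.0, 1.0]      # assumed division of the x axis into three intervals
exponents = [-1, 1, -1]        # assumed i for each interval: f/g = a**i there
a = 1.1
weight = tukey_weight([0.2, -1.5, 2.0], breakpoints, exponents, a)
print(weight)
```

The result is the weight w(X); v(X) is this weight when the statistic does not exceed s and zero otherwise.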




7. APPLICATIONS

It will be advantageous to use Model II sampling only if it reduces the total cost, where the cost may be determined either by trying to estimate H(s) with minimum cost for a certain amount of precision or with maximum precision for a given cost. Since a problem of this type is relatively easy to program for a high speed computer, the major expense in an empirical sampling study will be the cost of time of the computer. The amount of computing time may be decreased either by decreasing the number of samples required or by decreasing the number of computations required for each sample.

If Model I is used the number of samples required can be determined by fixing the value of H(s) and var p_1(s). There may exist a g(x) such that, if it were used with Model II, var p_2(s) or var p_3(s) would be substantially lower and hence the number of samples required would be sufficiently smaller to more than offset the increased amount of computing. Another case in which Model II could reduce the number of samples is where the sampling is being carried out mainly to determine the "tails" of the distribution of s. 1000 samples under Model I will not give much information about the value of s_0 for which H(s_0) = .9999. Under Model II a g(x) could be selected such that, say, an average of 10 out of every 1000 samples would have s ≥ s_0 and hence provide an estimate of H(s_0).
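The arithmetic behind the tail example is worth writing out. Under Model I with N = 1000 and H(s_0) = .9999, the number of samples falling beyond s_0 is binomial, so

$$ E\{\text{number of samples with } s > s_0\} = N [1 - H(s_0)] = 1000 \times 0.0001 = 0.1 $$

$$ \mathrm{Prob}\{\text{no sample with } s > s_0\} = (0.9999)^{1000} \approx e^{-0.1} \approx 0.905 . $$

About nine runs in ten would therefore contain no observation at all in the region of interest, whereas a g(x) giving an average of 10 such samples per 1000 would almost always contain some.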

Model II sampling could also be used to decrease the computing time required for each sample. The first step in the computation consists of transforming uniform variates to observations from f(x). In some cases this involves considerable computation and there may exist a g(x) such that it is quicker to transform the uniform variates to observations from g(x) and compute v(X). Sometimes the distribution of a statistic is required for a set of functions f_1(x), f_2(x), ... . Instead of using each of these in Model I sampling it may be advisable to select one of them as g(x) and use Model II, particularly if it is desirable and/or permissible to use the same set of uniform variates for each of the functions. Under this procedure, the value of the statistic would have to be calculated only once for each sample instead of once for each f_k(x), and this could be a great saving if, for example, the calculation of the statistic involved the inversion of a matrix.
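The saving described above can be sketched as follows. All specifics are assumptions for illustration: the densities f_k are normal with mean zero and standard deviations 1.0, 1.2 and 1.5, the middle one is used as g(x), the statistic is the sum of squares of n = 2 observations (for which H(s) = 1 - exp(-s/(2τ²)) is known exactly), and the function names are hypothetical.

```python
import math
import random

def normal_pdf(x, tau):
    """Density of a normal variate with mean zero and standard deviation tau."""
    return math.exp(-0.5 * (x / tau) ** 2) / (tau * math.sqrt(2.0 * math.pi))

def shared_sample_estimates(s, n, sigma_g, taus, N, rng):
    """Draw every sample from g once, compute the statistic once, and reweight
    by f_k/g for each candidate density f_k = N(0, tau_k^2)."""
    totals = [0.0] * len(taus)
    for _ in range(N):
        xs = [rng.gauss(0.0, sigma_g) for _ in range(n)]
        stat = sum(x * x for x in xs)                # computed once per sample
        if stat <= s:
            g_val = math.prod(normal_pdf(x, sigma_g) for x in xs)
            for k, tau in enumerate(taus):
                f_val = math.prod(normal_pdf(x, tau) for x in xs)
                totals[k] += f_val / g_val           # v(X) under f_k
    return [t / N for t in totals]

taus = [1.0, 1.2, 1.5]
rng = random.Random(3)                               # arbitrary seed
estimates = shared_sample_estimates(3.0, 2, sigma_g=1.2, taus=taus, N=20000, rng=rng)
exact = [1.0 - math.exp(-3.0 / (2.0 * tau * tau)) for tau in taus]
print(estimates, exact)
```

Because all estimates share the same samples, differences between the f_k are estimated more precisely than the individual values, at the price of correlated errors.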

In using Model II sampling, it is therefore necessary to determine g(x) in a way that will achieve the desired effect. Unfortunately this is a difficult thing to do. Equations (5.5) and (5.10) give conditions under which estimates under Model II sampling may have a lower variance than estimates under Model I. As methods of determining in advance which g(x) should be used they are not much help, since the integrals involved will generally be as difficult to evaluate as the original one. Therefore the answer must come largely from intuition and empirical work and, in difficult cases, mainly from the latter.

8. NUMERICAL EXAMPLE

The following example has been chosen because the expressions for the variances of p_2(s) and p_3(s) can be evaluated analytically. Suppose the sampling distribution of the sum of squares of n standardized normal deviates is required and that Model IIa sampling will be used with g(x) a normal distribution with mean zero and variance σ². Tables I and II give the variance of p_2 and the asymptotic variance of p_3 for different σ² and H(s), for n = 1 and 9 respectively. (The numbers given are actually N times the variance.)

These values were calculated as follows:

$$ f(x) = \frac{1}{\sqrt{2\pi}} \, e^{-x^2/2} $$

$$ g(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-x^2 / 2\sigma^2} $$

$$ \frac{[f(x)]^2}{g(x)} = (\sigma \sigma^*) \, \frac{1}{\sigma^* \sqrt{2\pi}} \, e^{-x^2 / 2\sigma^{*2}} \qquad \text{where } \sigma^{*2} = \frac{\sigma^2}{2\sigma^2 - 1} $$

$$ \frac{[f(\underline{x})]^2}{g(\underline{x})} = (\sigma \sigma^*)^n \left( \frac{1}{\sigma^* \sqrt{2\pi}} \right)^n e^{-\sum_j x_j^2 / 2\sigma^{*2}} $$

$$ s(\underline{x}) = \sum_j x_j^2 $$

$$ A(s) = \{ \underline{x} \in R_n : s \le s_H \} \qquad \text{where } H = H(s_H) = \mathrm{Prob}\{ s \le s_H \} $$

$$ H = \int_{A(s)} f(\underline{x}) \, d\underline{x} = \int_0^{s_H} K_n(s) \, ds $$

where K_n(s) is the Chi-Square distribution with n degrees of freedom. Therefore

$$ \int_{A(s)} [f(\underline{x})]^2 [g(\underline{x})]^{-1} \, d\underline{x} = (\sigma \sigma^*)^n \int_0^{s_H / \sigma^{*2}} K_n(s) \, ds $$

and this integral can be evaluated directly from tables of the Chi-Square distribution.
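The calculation can be reproduced for n = 1 with the Python standard library alone, since the Chi-Square distribution function with one degree of freedom is erf(sqrt(x/2)). The sketch below evaluates N var p_2 from (5.3) and the asymptotic N var p_3 from (5.8), using (5.9) and var w = σσ* - 1; the restriction to n = 1, the requirement σ² > 1/2 and the function names are assumptions of this illustration, not part of the original computation.

```python
import math
from statistics import NormalDist

def chi2_1_cdf(x):
    """Chi-Square distribution function with one degree of freedom."""
    return math.erf(math.sqrt(x / 2.0)) if x > 0 else 0.0

def n_times_variances(sigma2, H):
    """Return (N var p_2, asymptotic N var p_3) for n = 1; requires sigma2 > 1/2."""
    sigma_star2 = sigma2 / (2.0 * sigma2 - 1.0)
    c = math.sqrt(sigma2 * sigma_star2)                 # integral of f^2/g over the whole line
    s_H = NormalDist().inv_cdf(0.5 * (1.0 + H)) ** 2    # Chi-Square(1) quantile for probability H
    I = c * chi2_1_cdf(s_H / sigma_star2)               # integral of f^2/g over A(s)
    var_v = I - H * H                                   # (5.1)
    cov_vw = I - H                                      # (5.9)
    var_w = c - 1.0                                     # E(w^2) - [E(w)]^2 with E(w) = 1
    return var_v, var_v - 2.0 * H * cov_vw + H * H * var_w   # (5.3), (5.8)

print(n_times_variances(0.90, 0.80))
print(n_times_variances(2.00, 0.80))
```

For general n the same formulas apply with (σσ*)^n in place of σσ* and the Chi-Square distribution function with n degrees of freedom, which would need a library routine or tables.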
The variance under Model I is given by σ² = 1. It may be noticed that if σ² < 1, p_2(s) will sometimes give a smaller variance; if σ² > 1, p_3(s) will sometimes give a smaller variance. For n = 9 and H = .80 all the variances are given to indicate how rapidly the variances can increase if an inappropriate estimator is used.


April 15, 1952
TABLE I

Comparison of Variances of Estimators
under Models I and II, for n = 1

              H = .80               H = .90               H = .95
  σ²      Var p_2  Var* p_3     Var p_2  Var* p_3     Var p_2  Var* p_3

 0.95      .048                  .047                  .0871
 0.60      .085                  .072                  .0929
 0.70      .107                  .060                  .0419
 0.80      .117                  .064                  .0378
 0.90      .138                  .075                  .0385
 1.00      .160     .160         .090     .090         .0475    .0475
 1.25               .139                  .068                  .0318
 1.50               .131                  .060                  .0252
 2.00               .127                  .052                  .0197
 3.00               .133                  .050                  .0171
 4.00                                     .051                  .0167
 5.00                                  [illegible]              .0169

* Asymptotic variance
TABLE II

Comparison of Variances of Estimators
under Models I and II, for n = 9

              H = .80               H = .90               H = .95
  σ²      Var p_2  Var* p_3     Var p_2  Var* p_3     Var p_2  Var* p_3

 0.60      .684    8.222        1.306                 2.218
 0.75      .177     .597         .265                  .371
 0.80      .135     .391         .159                  .191
 0.90      .118     .222         .057                  .059
 1.00      .160     .160         .090     .090         .048     .048
 1.25      .610     .136                  .059                  .021
 1.50      .947     .119                  .046                  .017
 2.00     2.896     .214                  .062                  .018

* Asymptotic variance